Practical Identifiability of Finite Mixtures of Multivariate Bernoulli Distributions
The class of finite mixtures of multivariate Bernoulli distributions is known to be nonidentifiable; that is, different values of the mixture parameters can correspond to exactly the same probability distribution. In principle, this would mean that sample estimates using this model would give rise to different interpretations. We give empirical support to the fact that estimation of this class of mixtures can still produce meaningful results in practice, thus lessening the importance of the identifiability problem. We also show that the expectation-maximization algorithm is guaranteed to converge to a proper maximum likelihood estimate, owing to a property of the log-likelihood surface. Experiments with synthetic data sets show that an original generating distribution can be estimated from a sample. Experiments with an electropalatography data set show important structure in the data.
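The estimation procedure the abstract refers to can be illustrated with a minimal EM loop for a mixture of multivariate Bernoullis. This is a generic textbook sketch, not code from the paper; all variable names and the synthetic-data setup are illustrative.

```python
import numpy as np

def em_bernoulli_mixture(X, K, n_iter=100, seed=0):
    """EM for a mixture of K multivariate Bernoulli distributions.

    X: (N, D) binary data matrix.
    Returns mixing weights pi (K,) and Bernoulli prototypes P (K, D).
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)
    P = rng.uniform(0.25, 0.75, size=(K, D))  # init away from 0/1 extremes
    eps = 1e-10
    for _ in range(n_iter):
        # E-step: per-component log-likelihood, then responsibilities
        logp = X @ np.log(P + eps).T + (1 - X) @ np.log(1 - P + eps).T
        logp += np.log(pi + eps)
        logp -= logp.max(axis=1, keepdims=True)  # stabilise the exponent
        R = np.exp(logp)
        R /= R.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights and prototypes from responsibilities
        Nk = R.sum(axis=0)
        pi = Nk / N
        P = (R.T @ X) / (Nk[:, None] + eps)
    return pi, P
```

Despite the nonidentifiability of the class, running such a loop on binary samples typically recovers a distribution close to the generating one, which is the empirical point the paper makes.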
A latent-variable modelling approach to the acoustic-to-articulatory mapping problem
We present a latent variable approach to the acoustic-to-articulatory mapping problem, where different vocal tract configurations can give rise to the same acoustics. In latent variable modelling, the combined acoustic and articulatory data are assumed to have been generated by an underlying low-dimensional process. A parametric probabilistic model is estimated and mappings are derived from the respective conditional distributions. This has the advantage over other methods, such as articulatory codebooks or neural networks, of directly addressing the nonuniqueness problem. We demonstrate our approach with electropalatographic and acoustic data from the ACCOR database.
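The idea of deriving a mapping from a conditional distribution can be sketched for the simplest case, a jointly Gaussian model (e.g. one fitted by factor analysis, where the joint covariance is C = W Wᵀ + Ψ). The function and index names below are illustrative, not from the paper, and the sketch assumes zero-mean data.

```python
import numpy as np

def conditional_mapping(C, idx_a, idx_y):
    """Given the joint covariance C of stacked [acoustic, articulatory]
    variables, return the conditional-mean map x -> E[y | x] for
    zero-mean jointly Gaussian data (a sketch of the general idea)."""
    Caa = C[np.ix_(idx_a, idx_a)]          # acoustic block
    Cya = C[np.ix_(idx_y, idx_a)]          # cross-covariance block
    A = Cya @ np.linalg.inv(Caa)           # regression matrix of E[y | x]
    return lambda x: A @ x
```

With a nonlinear latent variable model the conditional distribution can be multimodal, which is how this framework represents the nonuniqueness of the inverse mapping rather than averaging it away.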
Experimental Evaluation of Latent Variable Models for Dimensionality Reduction
We use electropalatographic (EPG) data as a test bed for dimensionality reduction methods based on latent variable modelling, in which an underlying lower-dimensional representation is inferred directly from the data. Several models (and mixtures of them) are investigated, including factor analysis and the generative topographic mapping. Experiments indicate that nonlinear latent variable modelling reveals a low-dimensional structure in the data inaccessible to the investigated linear model.
Structured Multi-Hashing for Model Compression
Despite the success of deep neural networks (DNNs), state-of-the-art models are too large to deploy on low-resource devices or common server configurations in which multiple models are held in memory. Model compression methods address this limitation by reducing the memory footprint, latency, or energy consumption of a model with minimal impact on accuracy. We focus on the task of reducing the number of learnable variables in the model. In this work we combine ideas from weight hashing and dimensionality reduction, resulting in a simple and powerful structured multi-hashing method based on matrix products that allows direct control of the model size of any deep network and is trained end-to-end. We demonstrate the strength of our approach by compressing models from the ResNet, EfficientNet, and MobileNet architecture families. Our method allows us to drastically decrease the number of variables while maintaining high accuracy. For instance, by applying our approach to EfficientNet-B4 (16M parameters) we reduce it to the size of B0 (5M parameters), while gaining over 3% in accuracy over the B0 baseline. On the commonly used benchmark CIFAR10 we reduce the ResNet32 model by 75% with no loss in quality, and are able to do a 10x compression while still achieving above 90% accuracy. Comment: Elad and Yair contributed equally to the paper; they jointly proposed the idea of structured multi-hashing. Elad wrote most of the code and ran most of the experiments; Yair was the main contributor to the manuscript; Hao and Yerlan contributed coding and experiments; Miguel advised Yerlan on optimization and model compression; Mark and Andrew contributed experiments.
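The core idea of weight hashing, which the paper builds on, is to store a small shared parameter pool and index into it with a fixed hash when materialising a large weight matrix. The sketch below shows plain HashedNet-style hashing; the paper's structured multi-hashing replaces the random hash with structured matrix products, which this illustrative snippet does not reproduce.

```python
import numpy as np

def hashed_weight_matrix(pool, shape, seed=0):
    """Materialise a weight matrix of the given shape whose entries are
    shared entries of a small trainable pool, selected by a fixed hash
    (here a seeded RNG standing in for the hash function)."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, pool.size, size=shape)  # fixed, non-trainable indices
    return pool[idx]
```

Gradients with respect to the large matrix sum back into the shared pool entries, so the number of learnable variables is the pool size, not the matrix size.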
On Contrastive Divergence Learning
Maximum-likelihood (ML) learning of Markov random fields is challenging because it requires estimates of averages that have an exponential number of terms. Markov chain Monte Carlo methods typically take a long time to converge on unbiased estimates, but Hinton (2002) showed that if the Markov chain is only run for a few steps, the learning can still work well and it approximately minimizes a different function called "contrastive divergence" (CD). CD learning has been successfully applied to various types of random fields. Here, we study the properties of CD learning and show that it provides biased estimates in general, but that the bias is typically very small. Fast CD learning can therefore be used to get close to an ML solution and slow ML learning can then be used to fine-tune the CD solution.
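Running the chain "for a few steps" is concrete in the common CD-1 case for a restricted Boltzmann machine: one Gibbs step from the data, then a gradient built from the difference of data and reconstruction statistics. This is a standard textbook sketch (biases omitted for brevity), not code from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, v0, rng, lr=0.1):
    """One CD-1 update of an RBM weight matrix W (visible x hidden).

    v0: (N, D) binary data batch; rng: numpy Generator for sampling.
    """
    # Positive phase: hidden probabilities and samples given the data
    ph0 = sigmoid(v0 @ W)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # One Gibbs step: reconstruct visibles, then hidden probabilities again
    pv1 = sigmoid(h0 @ W.T)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W)
    # CD-1 gradient: data statistics minus reconstruction statistics
    grad = (v0.T @ ph0 - v1.T @ ph1) / v0.shape[0]
    return W + lr * grad
```

The paper's point is that this gradient is biased relative to the true ML gradient because the chain is truncated, but that the bias is usually small in practice.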
Are Visual Cortex Maps Optimised for Coverage?
The elegant regularity of maps of variables such as ocular dominance, orientation and spatial frequency in primary visual cortex has prompted many people to suggest their structure could be explained by an optimisation principle. Up to now, the standard way to test this hypothesis has been to generate artificial maps by optimising a hypothesised objective function, and then to compare these artificial maps with real maps using a variety of quantitative criteria. If the artificial maps are similar to the real maps, this provides some evidence that the real cortex may be optimising a similar function to the one hypothesised. However, recently a more direct method has been proposed for testing whether real maps represent local optima of an objective function (Swindale et al., 2000). In this approach, the value of the hypothesised function is calculated for a real map, and then the real map is perturbed in certain ways and the function recalculated. If each of these perturbations leads to a worsening of the function, it is tempting to conclude that the real map is quite likely to represent a local optimum of that function. In the current paper we argue that such perturbation results provide only weak evidence in favour of the optimisation hypothesis.
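The perturbation procedure being critiqued is simple to state in code: evaluate the hypothesised objective on the real map, apply each perturbation, and count how many perturbations worsen the value. The sketch below is a generic illustration of that protocol (assuming a cost to be minimised, so "worse" means larger); the function names are ours, not Swindale et al.'s.

```python
import numpy as np

def perturbation_test(objective, map_, perturbations):
    """Local-optimum check: count how many perturbations of the map
    worsen (increase) the hypothesised objective function."""
    base = objective(map_)
    worse = sum(objective(p(map_)) > base for p in perturbations)
    return worse, len(perturbations)
```

The paper's argument is that even a perfect score on such a test, with every perturbation worsening the objective, is only weak evidence that the map is a local optimum of that objective.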
Predicting Tongue Shapes from a Few Landmark Locations
We present a method for predicting the midsagittal tongue contour from the locations of a few landmarks (metal pellets) on the tongue surface, as used in articulatory databases such as MOCHA and the Wisconsin XRDB. Our method learns a mapping using ground-truth tongue contours derived from ultrasound data and drastically improves over spline interpolation. We also determine the optimal locations of the landmarks, and the number of landmarks required to achieve a desired prediction error: 3-4 landmarks are enough to achieve 0.3-0.2 mm error per point on the tongue.
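The contrast with spline interpolation is that the mapping from landmarks to the full contour is learned from training pairs. The simplest learned mapping of this kind is a linear least-squares regression from landmark coordinates to contour coordinates, sketched below with hypothetical training matrices; the paper's actual predictor may differ.

```python
import numpy as np

def fit_contour_predictor(L, C):
    """Fit a linear map from landmark coordinates L (N, d) to full
    contour coordinates C (N, D) by least squares; returns a function
    predicting a contour from one landmark vector."""
    A, *_ = np.linalg.lstsq(L, C, rcond=None)  # solve L @ A ~ C
    return lambda l: l @ A
```

Unlike a spline through the landmarks alone, such a learned map exploits contour shapes seen in training data, which is why it can do well with only 3-4 landmarks.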